 auxiliary task


Expectation Error Bounds for Transfer Learning in Linear Regression and Linear Neural Networks

Liu, Meitong, Jung, Christopher, Li, Rui, Feng, Xue, Zhao, Han

arXiv.org Machine Learning

In transfer learning, the learner leverages auxiliary data to improve generalization on a main task. However, the precise theoretical understanding of when and how auxiliary data help remains incomplete. We provide new insights on this issue in two canonical linear settings: ordinary least squares regression and under-parameterized linear neural networks. For linear regression, we derive exact closed-form expressions for the expected generalization error with bias-variance decomposition, yielding necessary and sufficient conditions for auxiliary tasks to improve generalization on the main task. We also derive globally optimal task weights as outputs of solvable optimization programs, with consistency guarantees for empirical estimates. For linear neural networks with shared representations of width $q \leq K$, where $K$ is the number of auxiliary tasks, we derive a non-asymptotic expectation bound on the generalization error, yielding the first non-vacuous sufficient condition for beneficial auxiliary learning in this setting, as well as principled directions for task weight curation. We achieve this by proving a new column-wise low-rank perturbation bound for random matrices, which improves upon existing bounds by preserving fine-grained column structures. Our results are verified on synthetic data simulated with controlled parameters.
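As a rough illustration of the linear regression setting described above, the following sketch pools a small main-task sample with a larger auxiliary sample through a single task weight. It is not the paper's estimator or its closed-form error expression: the objective, the function name `weighted_aux_ols`, the weight `alpha`, and all synthetic parameters (dimension, sample sizes, noise level, task shift) are assumptions chosen for illustration only.

```python
import numpy as np


def weighted_aux_ols(X_main, y_main, X_aux, y_aux, alpha):
    """Solve min_w ||X_main w - y_main||^2 + alpha * ||X_aux w - y_aux||^2.

    Hypothetical helper: a plain weighted least-squares pooling of one main
    task and one auxiliary task, solved via the normal equations.
    """
    gram = X_main.T @ X_main + alpha * (X_aux.T @ X_aux)
    moment = X_main.T @ y_main + alpha * (X_aux.T @ y_aux)
    return np.linalg.solve(gram, moment)


# Synthetic data with controlled parameters (all values are assumptions).
rng = np.random.default_rng(0)
d, n_main, n_aux = 5, 20, 200
w_star = rng.normal(size=d)          # main-task parameter
shift = 0.1 * rng.normal(size=d)     # auxiliary task differs by a small shift
X_main = rng.normal(size=(n_main, d))
X_aux = rng.normal(size=(n_aux, d))
y_main = X_main @ w_star + 0.5 * rng.normal(size=n_main)
y_aux = X_aux @ (w_star + shift) + 0.5 * rng.normal(size=n_aux)

for alpha in (0.0, 0.1, 1.0):
    w_hat = weighted_aux_ols(X_main, y_main, X_aux, y_aux, alpha)
    # For isotropic test covariates, the excess risk on the main task
    # equals the squared parameter error.
    err = float(np.sum((w_hat - w_star) ** 2))
    print(f"alpha={alpha:.1f}  squared parameter error={err:.4f}")
```

Setting `alpha = 0` recovers ordinary least squares on the main task alone. Larger values trade variance reduction from the extra samples against bias from the task shift, which is the trade-off the abstract's bias-variance decomposition characterizes.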


Auxiliary Task Reweighting for Minimum-data Learning

Neural Information Processing Systems

Supervised learning requires a large amount of training data, limiting its application where labeled data is scarce. To compensate for data scarcity, one possible method is to utilize auxiliary tasks to provide additional supervision for the main task. Assigning and optimizing the importance weights for different auxiliary tasks remains a crucial and largely understudied research question. In this work, we propose a method to automatically reweight auxiliary tasks in order to reduce the data requirement on the main task. Specifically, we formulate the weighted likelihood function of auxiliary tasks as a surrogate prior for the main task. By adjusting the auxiliary task weights to minimize the divergence between the surrogate prior and the true prior of the main task, we obtain a more accurate prior estimation, achieving the goal of minimizing the required amount of training data for the main task and avoiding a costly grid search.
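One way to write down the idea in this abstract is sketched below. The notation is assumed rather than taken from the paper: $\theta$ denotes the model parameters, $\mathcal{T}_{a_k}$ the data of the $k$-th auxiliary task, $\alpha_k \ge 0$ its weight, $p^{*}(\theta)$ the true prior of the main task, and $D(\cdot,\cdot)$ an unspecified divergence.

```latex
% Surrogate prior: weighted likelihood of the K auxiliary tasks (assumed notation)
p_{\alpha}(\theta) \;\propto\; \prod_{k=1}^{K} p\!\left(\mathcal{T}_{a_k} \mid \theta\right)^{\alpha_k}

% Task weights chosen to bring the surrogate prior close to the true prior
\alpha^{\star} \;=\; \arg\min_{\alpha} \; D\!\left(p^{*}(\theta),\, p_{\alpha}(\theta)\right)
```

The abstract does not state which divergence is used or how it is estimated, so both are left abstract here.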




Revisiting Multi-Task Learning with ROCK: a Deep Residual Auxiliary Block for Visual Detection

Taylor Mordan, Nicolas Thome, Gilles Henaff, Matthieu Cord

Neural Information Processing Systems

Extensive experiments on the NYUv2 dataset (object detection with scene classification, depth prediction, and surface normal estimation as auxiliary tasks) validate the relevance of the approach and its superiority to flat MTL approaches.


Self-Supervised Generalisation with Meta Auxiliary Learning

Shikun Liu, Andrew Davison, Edward Johns

Neural Information Processing Systems

We show that our proposed method, Meta AuXiliary Learning (MAXL), outperforms single-task learning on 7 image datasets, without requiring any additional data. We also show that MAXL outperforms several other baselines for generating auxiliary labels, and is even competitive when compared with human-defined auxiliary labels. The self-supervised nature of our method leads to a promising new direction towards automated generalisation. Source code can be found at https://github.com/lorenmt/maxl.


Appendix for "Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation"

Neural Information Processing Systems

In our experiments, the fine-grained map, global semantic map, and multi-granularity map are of different sizes (as shown in Figure A) to save GPU memory. Object categories predicted by hallucination module. We use an Adam optimizer with a learning rate of 2.5e-4. Specifically, we consider the 10% area with the highest probability in the 2D distributions $P$ and $\hat{P}$ (as described in Section 3.3) as the ground-truth and predicted locations. From Table 1, this variant performs worse than our agent.
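The "10% area with the highest probability" step can be sketched as a top-fraction mask over a 2D probability map. This is only an illustrative reading of that sentence: the function name `top_fraction_mask`, the tie-handling, and the example map are assumptions, and the excerpt does not say how the resulting ground-truth and predicted regions are compared.

```python
import numpy as np


def top_fraction_mask(prob_map, fraction=0.10):
    """Return a boolean mask marking the `fraction` of cells with the
    highest probability in a 2D map (hypothetical helper)."""
    flat = prob_map.ravel()
    k = max(1, int(round(fraction * flat.size)))
    # Indices of the k largest entries; order among them is not needed.
    top_idx = np.argpartition(flat, -k)[-k:]
    mask = np.zeros(flat.size, dtype=bool)
    mask[top_idx] = True
    return mask.reshape(prob_map.shape)


# Example on a small synthetic 10x10 distribution (values are assumptions).
prob = np.arange(100, dtype=float).reshape(10, 10)
prob /= prob.sum()
mask = top_fraction_mask(prob, fraction=0.10)
print(mask.sum(), "cells selected out of", mask.size)
```

Applying the same function to both $P$ and $\hat{P}$ would give one region for the ground-truth location and one for the predicted location.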